Integrating Multi-threading and Accelerators into DUNE-ISTL
نویسندگان
چکیده
A major challenge in PDE software is the balance between user-level flexibility and performance on heterogeneous hardware. We discuss our ideas on how this challenge can be tackled, exemplarily for the DUNE framework and in particular its linear algebra and solver components. We demonstrate how the former MPI-only implementation is modified to support MPI+[CPU/GPU] threading and vectorisation. To this end, we devise a novel block extension of the recently proposed SELL-C-σ format. The efficiency of our approach is underlined by benchmark computations that exhibit reasonable speedups over the CPU-MPI-only case.
منابع مشابه
The Iterative Solver Template Library
The numerical solution of partial differential equations frequently requires the solution of large and sparse linear systems. Using generic programming techniques like in C++ one can create solver libraries that allow efficient realization of “fine grained interfaces”, i. e. with functions consisting only of a few lines, like access to individual matrix entries. This prevents code replication a...
متن کاملEfficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems
Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...
متن کاملAn ITK implementation of a physics-based non-rigid registration method for brain deformation in image-guided neurosurgery
As part of the ITK v4 project efforts, we have developed ITK filters for physics-based non-rigid registration (PBNRR), which satisfies the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selec...
متن کاملAn ITK Implementation of Physics-based Non-rigid Registration Method
As part of the ITK v4 project efforts, we have developed ITK filters for physics-based non-rigid registration (PBNRR), which satisfies the following requirements: account for tissue properties in the registration, improve accuracy compared to rigid registration, and reduce execution time using GPU and multi-core accelerators. The implementation has three main components: (1) Feature Point Selec...
متن کاملAccelerator Exoskeleton
To maximize performance and power efficiency, future multi-core architectures may be heterogeneous, incorporating some accelerator cores alongside the IA cores. Accelerator Exoskeletons provide a shared virtual memory heterogeneous multi-threaded programming paradigm for these accelerators using novel CPU instruction set extensions and software tool chains with an Intel Architecture (IA) look-n...
متن کامل